
Non-Stationary MDPs


Towards Safe Policy Improvement for Non-Stationary MDPs

Neural Information Processing Systems

Many real-world sequential decision-making problems involve critical systems that carry financial and human-life risks. While several past works have proposed methods that are safe for deployment, they assume that the underlying problem is stationary. However, many real-world problems of interest exhibit non-stationarity, and when stakes are high, the cost associated with a false stationarity assumption may be unacceptable. We take the first steps towards ensuring safety, with high confidence, for smoothly-varying non-stationary decision problems. Our proposed method extends a class of safe algorithms, called Seldonian algorithms, through a synthesis of model-free reinforcement learning with time-series analysis. Safety is ensured using sequential hypothesis testing of a policy's forecasted performance, and confidence intervals are obtained using the wild bootstrap.


MetaCURL: Non-stationary Concave Utility Reinforcement Learning

Neural Information Processing Systems

We explore online learning in episodic loop-free Markov decision processes in non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm that runs multiple black-box algorithm instances over different intervals and aggregates their outputs via a sleeping-expert framework. The key hurdle is partial information due to uncertainty about the MDP.
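The sleeping-expert aggregation mentioned above can be illustrated with a minimal sketch. The function name and the specific multiplicative update below are illustrative assumptions in the style of the classic specialists framework, not MetaCURL's actual update:

```python
import numpy as np

def sleeping_experts_step(weights, awake, losses, eta=0.5):
    """One round of a sleeping-experts (specialists) update.

    weights : current weight per expert (sums to 1)
    awake   : boolean mask of experts active this round
    losses  : loss of each expert this round (only awake entries used)
    """
    awake = np.asarray(awake, dtype=bool)
    # Prediction distribution: renormalize weights over awake experts only.
    p = np.where(awake, weights, 0.0)
    p = p / p.sum()
    new = weights.copy()
    upd = weights[awake] * np.exp(-eta * losses[awake])
    # Rescale so the total mass of awake experts is unchanged, which
    # implicitly charges each sleeping expert the mixture's own loss.
    new[awake] = upd * (weights[awake].sum() / upd.sum())
    return p, new
```

Sleeping experts keep their weight untouched, so an expert responsible for one interval is not penalized for rounds outside it.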


Review for NeurIPS paper: Towards Safe Policy Improvement for Non-Stationary MDPs

Neural Information Processing Systems

Summary and Contributions: In this paper, the authors introduce a novel model-free, policy-improvement-based algorithm for smooth non-stationary Markov decision processes (NS-MDPs), focusing on the safety guarantees of their method. The method relies heavily on Assumption 1 (smooth performance), implicitly assumed in [51], which enables treating off-policy evaluation (OPE) in the NS-MDP as a time-series forecasting (TSF) problem. The authors introduce safe policy improvement for NS-MDPs, which they term SPIN; under Assumption 1, it alternates between a policy evaluation step and a policy improvement step. Importance sampling is used for OPE on past evaluation samples. TSF is then applied to estimate future performance, while wild bootstrapping is used to obtain uncertainty estimates for that forecast.
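A toy version of the forecast-plus-wild-bootstrap step described in the review might look like the following. The linear trend model and the `forecast_ci` helper are illustrative assumptions; SPIN's actual procedure operates on importance-sampled performance estimates and a more careful sequential hypothesis test:

```python
import numpy as np

def forecast_ci(perf, n_boot=2000, alpha=0.05, seed=0):
    """Forecast next-episode performance with a linear trend fit to past
    per-episode estimates, plus a wild-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    perf = np.asarray(perf, dtype=float)
    t = np.arange(len(perf), dtype=float)
    X = np.column_stack([np.ones_like(t), t])      # intercept + slope
    beta, *_ = np.linalg.lstsq(X, perf, rcond=None)
    resid = perf - X @ beta
    x_next = np.array([1.0, len(perf)])            # next time step
    boots = np.empty(n_boot)
    for b in range(n_boot):
        # Rademacher multipliers keep each residual's magnitude but
        # flip its sign at random (robust to heteroscedastic noise).
        signs = rng.choice([-1.0, 1.0], size=len(perf))
        beta_b, *_ = np.linalg.lstsq(X, X @ beta + signs * resid, rcond=None)
        boots[b] = x_next @ beta_b
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return x_next @ beta, (lo, hi)
```

The wild bootstrap resamples by perturbing residuals in place rather than resampling rows, which preserves the time ordering that matters in a non-stationary setting.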


A New Interpretation of the Certainty-Equivalence Approach for PAC Reinforcement Learning with a Generative Model

Kalyanakrishnan, Shivaram, Shah, Sheel, Guguloth, Santhosh Kumar

arXiv.org Machine Learning

Reinforcement learning (RL) enables an agent interacting with an unknown MDP $M$ to optimise its behaviour by observing transitions sampled from $M$. A natural entity that emerges in the agent's reasoning is $\widehat{M}$, the maximum likelihood estimate of $M$ based on the observed transitions. The well-known \textit{certainty-equivalence} method (CEM) dictates that the agent update its behaviour to $\widehat{\pi}$, which is an optimal policy for $\widehat{M}$. Not only is CEM intuitive, it has been shown to enjoy minimax-optimal sample complexity in some regions of the parameter space for PAC RL with a generative model~\citep{Agarwal2020GenModel}. A seemingly unrelated algorithm is the ``trajectory tree method'' (TTM)~\citep{Kearns+MN:1999}, originally developed for efficient decision-time planning in large POMDPs. This paper presents a theoretical investigation that stems from the surprising finding that CEM may indeed be viewed as an application of TTM. The qualitative benefits of this view are (1) new and simple proofs of sample complexity upper bounds for CEM, in fact under a (2) weaker assumption on the rewards than is prevalent in the current literature. Our analysis applies to both non-stationary and stationary MDPs. Quantitatively, we obtain (3) improvements in the sample-complexity upper bounds for CEM both for non-stationary and stationary MDPs, in the regime that the ``mistake probability'' $\delta$ is small. Additionally, we show (4) a lower bound on the sample complexity for finite-horizon MDPs, which establishes the minimax-optimality of our upper bound for non-stationary MDPs in the small-$\delta$ regime.
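The certainty-equivalence recipe the abstract describes, namely forming the maximum-likelihood model and planning in it, can be sketched for tabular MDPs. This is a simplified discounted illustration with a known reward function; the paper itself treats finite-horizon PAC RL with a generative model:

```python
import numpy as np

def certainty_equivalence(counts, rewards, gamma=0.95, iters=500):
    """Build the MLE model M-hat from transition counts, then plan in it.

    counts[s, a, s2] : observed transitions from (s, a) to s2
    rewards[s, a]    : known reward function
    """
    n_s, n_a, _ = counts.shape
    totals = counts.sum(axis=2, keepdims=True)
    # MLE transition model; unvisited (s, a) pairs fall back to uniform.
    P = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_s)
    V = np.zeros(n_s)
    for _ in range(iters):                 # value iteration on M-hat
        Q = rewards + gamma * P @ V        # shape (n_s, n_a)
        V = Q.max(axis=1)
    return Q.argmax(axis=1), V             # greedy policy for M-hat
```

The returned policy is exactly the paper's pi-hat: optimal for the estimated model, not necessarily for the true M.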


Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Zhang, Ziyi, Nakahira, Yorie, Qu, Guannan

arXiv.org Artificial Intelligence

Policy design for non-stationary Markov Decision Processes (MDPs) has always been challenging due to time-varying system dynamics and rewards, so the learner usually suffers from uncertainty about future rewards and transitions. Fortunately, exogenous predictions are available in many applications. For example, in energy systems, look-ahead information is available in the form of renewable generation forecasts and demand forecasts [Amin et al., 2019]. It is then natural to design an algorithm that controls the energy system by using that information to concentrate energy usage in the time frame with the lowest energy price, lowering the overall energy cost. To give another example, smart servers can predict future internet traffic from historical data [Katris and Daskalaki, 2015]. Given that the server tries to minimize the average waiting time of all tasks, if there is only light traffic, the average waiting time is most reduced by using only the fastest server. However, if the smart server forecasts heavy traffic ahead, all servers should work to reduce the length of the queue. Yet although policy adaptation in time-varying environments has been extensively studied [Auer et al., 2008; Richards et al., 2021; Zhang et al., 2024; Gajane et al., 2018], these approaches do not typically take advantage of exogenous predictions.


Non-stationary Reinforcement Learning under General Function Approximation

Feng, Songtao, Yin, Ming, Huang, Ruiquan, Wang, Yu-Xiang, Yang, Jing, Liang, Yingbin

arXiv.org Artificial Intelligence

General function approximation is a powerful tool for handling large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, the theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called the dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set-based model-free algorithm called SW-OPEA, which features a sliding-window mechanism and a new confidence-set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret of the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate, via examples of non-stationary linear and tabular MDPs, that our algorithm outperforms existing UCB-type algorithms in the small-variation-budget regime. To the best of our knowledge, this is the first dynamic regret analysis for non-stationary MDPs with general function approximation.


Optimizing for the Future in Non-Stationary MDPs

Chandak, Yash, Theocharous, Georgios, Shankar, Shiv, White, Martha, Mahadevan, Sridhar, Thomas, Philip S.

arXiv.org Machine Learning

Most reinforcement learning methods are based upon the key assumption that the transition dynamics and reward functions are fixed, that is, that the underlying Markov decision process is stationary. However, in many real-world applications this assumption is violated, and using existing algorithms may result in a performance lag. To proactively search for a good future policy, we present a policy gradient algorithm that maximizes a forecast of future performance. This forecast is obtained by fitting a curve to counterfactual estimates of policy performance over time, without explicitly modeling the underlying non-stationarity. The resulting algorithm amounts to a non-uniform reweighting of past data, and we observe that minimizing performance over some of the data from past episodes can be beneficial when searching for a policy that maximizes future performance. We show that our algorithm, called Prognosticator, is more robust to non-stationarity than two online adaptation techniques on three simulated problems motivated by real-world applications.
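The "non-uniform reweighting of past data" view can be made concrete: extrapolating a least-squares curve fit to the next time step is a fixed linear functional of the past estimates, so the forecast is just a weighted sum, and some weights come out negative. The helper below is an illustrative sketch of that observation, not Prognosticator's implementation:

```python
import numpy as np

def forecast_weights(n_past, degree=1):
    """Weights w such that extrapolating a degree-d least-squares fit of
    the past n_past performance estimates to the next time step equals
    w @ estimates -- the non-uniform-reweighting view of forecasting."""
    t = np.arange(n_past)
    X = np.vander(t, degree + 1, increasing=True)       # polynomial basis
    x_next = np.vander(np.array([n_past]), degree + 1, increasing=True)
    # Forecast = x_next (X^T X)^{-1} X^T y, so the weights are the row
    # vector x_next pinv(X); early episodes can receive negative weight.
    return (x_next @ np.linalg.pinv(X)).ravel()
```

Negative weights on old episodes are exactly the "minimizing performance over some past data" effect the abstract notes.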


Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

Cheung, Wang Chi, Simchi-Levi, David, Zhu, Ruihao

arXiv.org Machine Learning

We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state-transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do not exceed certain variation budgets. We first develop the Sliding Window Upper-Confidence bound for Reinforcement Learning with Confidence Widening (SWUCRL2-CW) algorithm and establish its dynamic regret bound when the variation budgets are known. In addition, we propose the Bandit-over-Reinforcement Learning (BORL) algorithm to adaptively tune SWUCRL2-CW and achieve the same dynamic regret bound in a parameter-free manner, i.e., without knowing the variation budgets. Notably, learning non-stationary MDPs via the conventional optimistic exploration technique presents a unique challenge absent in existing (non-stationary) bandit learning settings. We overcome this challenge with a novel confidence-widening technique that incorporates additional optimism.
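The two ingredients named in the abstract, a sliding-window empirical model and a widened confidence radius, can be sketched as follows. The Hoeffding-style radius and the additive widening term `eta` are illustrative stand-ins; SWUCRL2-CW's actual radius differs in form and constants:

```python
import numpy as np

def widened_confidence(counts_window, delta=0.05, eta=0.1):
    """Sliding-window transition estimate with a widened confidence radius.

    counts_window[s, a, s2] : transitions observed inside the current window
    Returns the empirical model and, per (s, a), a Hoeffding-style radius
    plus the constant widening term eta (extra optimism against drift).
    """
    n = counts_window.sum(axis=2)                  # window visits per (s, a)
    n_safe = np.maximum(n, 1)
    p_hat = counts_window / n_safe[..., None]
    radius = np.sqrt(2.0 * np.log(1.0 / delta) / n_safe) + eta
    return p_hat, radius
```

Restricting counts to a recent window forgets stale dynamics, while the extra `eta` keeps the confidence set large enough to cover drift within the window.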